Google Search Bugs

Dear Friends:

I published a website earlier this year on the American folk song, Follow the Drinking Gourd. I closely tracked how the three major search engines handled the site, including how quickly the site was ranked, how many pages each of the three major engines found, how often they visited these pages, and how fast they dropped stale content from their rankings. (Details here.)

Along the way, I couldn’t help but notice some striking anomalies in the Google results. I am no SEO maven, so these anomalies (hiccups, ghosts in the machine, BUGS!) may already be well known to this site’s readers. In case they aren’t, here are three types. In increasing order of user impact: mishandled punctuation, excessively volatile search results, and pages that are supposedly cached but are not fully searchable. First, the punctuation.

A PANDA WALKS INTO A BAR…

Google searches for exact phrases that span punctuation marks, such as exclamation points, ellipses, and en-dashes, often fall apart. I provide some examples from my own site, but my favorite comes from a Search Engine Watch article praising Yahoo!, in which Eric Enge wrote on May 2nd, 2007: “On another note, I really like the way Yahoo! is playing this game right now.” (Note the exclamation point: it is part of Yahoo!’s official name, but Google mishandles it.) A quoted search on any phrase drawn from the part of this sentence up to and including “Yahoo!” finds the article. Include “Yahoo!” together with any words to its right, and the string is not found. Leave “Yahoo!” off, search only on words to its right, and Google finds the document again. None of the searches in this section that bamboozle Google cause any problem for Yahoo! or MSN. (See here for more information.)

AT WHAT POINT DOES “VOLATILE” = “BUGGY”???

I am certainly not the first person to comment on the volatility of search results. Two leading search engine commentators take a valiant crack at explaining it here (Stoney deGeyter) and here (Rand Fishkin). They rightly focus on factors such as when a site’s content is first cached, when inbound links to that site were first discovered, changes at competing sites, and so on.

Still, perhaps there is an elephant in the room? If other large-scale software systems exhibited similar oscillations, we’d come right out and say they were, at least in part, buggy and unpredictable.

Using my site as a barometer: on April 6th, it ranked 26th for a Google search on “follow the drinking gourd.” The next afternoon, it was 44th. The next evening, it was 29th. These are fairly volatile shifts for an obscure corner of the Web, one where the other relevant pages had changed little, if at all, for quite some time.

When I compared the rankings on April 6th with those on the afternoon of April 7th, the average change in rank among the top 50 sites was just two places. My site’s 18-place swing was twice that of any other site in the top 50. (See the details here.)
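A minimal sketch of this kind of comparison, assuming each ranking snapshot is recorded as a mapping from URL to rank position. The URLs and ranks below are hypothetical placeholders (only the 26th-to-44th move mirrors the figures above); they are not the actual April data.

```python
from statistics import mean


def rank_changes(before: dict[str, int], after: dict[str, int]) -> dict[str, int]:
    """Absolute change in rank for every URL that appears in both snapshots."""
    return {url: abs(after[url] - before[url]) for url in before if url in after}


def summarize(before: dict[str, int], after: dict[str, int]) -> tuple[float, str, int]:
    """Return the average change and the URL with the largest swing."""
    changes = rank_changes(before, after)
    avg = mean(changes.values())
    biggest = max(changes, key=changes.get)
    return avg, biggest, changes[biggest]


# Hypothetical snapshots (URL -> rank); not the real April 6th/7th results.
april_6 = {"site-a.example": 1, "site-b.example": 2, "site-c.example": 3, "my-site.example": 26}
april_7 = {"site-a.example": 1, "site-b.example": 3, "site-c.example": 2, "my-site.example": 44}

avg, url, swing = summarize(april_6, april_7)
print(f"average change: {avg:.2f}; largest swing: {url} moved {swing} places")
```

With these toy numbers the script prints an average change of 5.00 and flags `my-site.example` with an 18-place swing. One design choice worth noting: a site that drops out of either snapshot is simply skipped, so a real comparison would need to decide how to treat pages that leave the top 50.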

In fact, we bounced randomly between 25th and 44th for nearly ten weeks (details here). I am happy to say we are now ranked number one; perhaps Google was just waiting until we were “baked to perfection.”

THE CURIOUS CASE OF THE COUNTERFEIT CACHE

To track where my content shows up on the Web, I set up a Google Alert with one phrase from each page of this site. (There was nothing unusual or tricky about these phrases; they just had to be unique on the Web so I wouldn’t be bombarded with email Alerts.) I searched on these same phrases every few weeks. When I did, I found that a surprising percentage of phrases from pages that had been “cached” were nevertheless not found in a Google search. In other words, it was possible to navigate to a page from this site in the Google cache, select a unique string of text from that page, search for that same text in Google, and come up empty.

As covered by Eric Enge here, the page Interpretation_Over_The_Last_Ten_Years.htm was first cached by Google on January 31st. It contains the phrase “formed the narrative core of a planetarium show.” The page was cached again on March 16th, yet a search for the test phrase that same day came up empty. The page was found in searches on March 18th and March 31st. On April 1st, Google cached the page again. A search on April 24th failed, but by May 1st, this search worked again! (See here for screenshots and more examples.)

I prepared a chart showing how scattershot Google’s search results proved over time (see it here). Fully 19 of the site’s 30 pages have been affected at some point. By contrast, when either Yahoo! or MSN cached pages, their contents were overwhelmingly searchable in their entirety. It would take a thorough analysis, not a small hand-checked sample, to determine how widespread this Google search bug is, and it is quite possible that many more pages are affected. I consider this a severe reliability problem for Google, and for websites that rely on the Google search box to let users find content on their site.
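The “affected at some point” count is a simple tally over a log of manual checks. A minimal sketch, assuming each check is recorded as a page name, a day, whether a Google cache copy existed, and whether the quoted search for that page’s unique phrase found it. The Interpretation_Over_The_Last_Ten_Years.htm entries follow the timeline above; `other_page.htm` and the `Check` record layout are hypothetical.

```python
from typing import NamedTuple


class Check(NamedTuple):
    page: str      # page file name on the site
    day: str       # day of the manual search (year omitted)
    cached: bool   # a Google cache copy of the page existed that day
    found: bool    # quoted search for the page's unique phrase returned it


def affected_pages(log: list[Check]) -> set[str]:
    """Pages that were cached yet, at least once, not found by their unique phrase."""
    return {c.page for c in log if c.cached and not c.found}


log = [
    Check("Interpretation_Over_The_Last_Ten_Years.htm", "Mar 16", True, False),
    Check("Interpretation_Over_The_Last_Ten_Years.htm", "Mar 18", True, True),
    Check("Interpretation_Over_The_Last_Ten_Years.htm", "Mar 31", True, True),
    Check("Interpretation_Over_The_Last_Ten_Years.htm", "Apr 24", True, False),
    Check("Interpretation_Over_The_Last_Ten_Years.htm", "May 1", True, True),
    Check("other_page.htm", "Apr 24", True, True),  # hypothetical page
]

pages = {c.page for c in log}
hit = affected_pages(log)
print(f"{len(hit)} of {len(pages)} pages affected at some point")
```

Here the output is “1 of 2 pages affected at some point.” Run over a complete log of every page and search date, this is the kind of tally that sits behind a figure such as 19 of 30.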

I look forward to your comments.

Joel Bresler, Independent Researcher
